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Detection of a Dark Substructure through Gravitational Imaging 
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ABSTRACT 

We report the detection of a dark substructure - undetected in the HST-ACS F8 14W image - in 
the gravitational lens galaxy SDSSJ0946+1006 (the "Double Einstein Ring"), through direct 
gravitational imaging. The lens galaxy is of particular interest because of its relative high in- 
ferred fraction of dark matter inside the effective radius. The detection is based on a Bayesian 
grid reconstruction of the two-dimensional surface density of the galaxy inside an annulus 
around its Einstein radius. The detection of a small mass concentration in the surface density 
maps has a strong statistical significance. We confirm this detection by modeling the substruc- 
ture with a tidally truncated pseudo-Jaffe density profile; in that case the substructure mass is 
M sub = (3.51+O.15)xlO 9 M , located at (-0.651 ±0.038, 1.040+0.034)", precisely where also 
the surface density map shows a strong convergence peak (Bayes factor Alog£ = -128.0; 
equivalent to a ~l6-cr detection). We set a lower limit of (M/L) V o > 120 M /Ly, (3-cr) in- 
side a sphere of 0.3 kpc centred on the substructure (r t ;dai=l-l kpc). The result is robust under 
substantial changes in the model and the data-set (e.g. PSF, pixel number and scale, source and 
potential regularization, rotations and galaxy subtraction). It can therefore not be attributed to 
obvious systematic effects. Our detection implies a dark matter mass fraction at the radius of 
the inner Einstein ring of /cdm = 2.15^ 25 percent (68% C.L) in the mass range 4 x 1O 6 M to 
4 x 1O 9 M assuming a - 1.9 + 0.1 (with dN/dm cc nr"). Assuming a flat prior on a, between 
1.0 and 3.0, increases this to /cdm = 2.56*j?g percent (68% C.L). The likelihood ratio is 0.51 
between our best value (/cdm = 0.0215) and that from simulations (/j m w 0.003). Hence the 
inferred mass fraction, admittedly based on a single lens system, is large but still consistent 
with predictions. We expect to further tighten the substructure mass function (both fraction 
and slope), using the large number of systems found by SLACS and other surveys. 
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1 INTRODUCTION 

In the process of building a coherent picture of galaxy formation 
and evolution, early-type galaxies play a crucial role. Often un- 
fairly referred to as red and dead objects, many aspects about 
their structure and formation are still unknown. What is the ori- 
gin of the tight empirical relations between their global properties 
i Diorgovski & Davisl fl987l: iDressler et al.l 1 1987NMagorrian et al ' 



19981 : iFerrarese & Merrittll200d:lGebhardt et al]200Cl : Bower etal 
19921 : iGuzman et al.ll 19921 : iBender et alJI 19931) ? How do massive 
early-type assemble? What is the fraction of mass substructure 
populating the haloes of earl y-type galaxies and is this in agree- 
ment with the CDM parad i gm dKauffmann et alJl993l:lMoore et al .1 



1999 ]: iKlypin et al.l 1 19991"; iMoore et alj 12001 



20061 : iDiemand et al.ll2008l : ISpringel et alj2008l : IXu et al.ll2009h 



Maccio & Miranda 



Gravitational lensing, especially in combination with other 
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techniques, provides an invaluable a nd sometimes unique in- 
sight in answering these questions (e.g. Rusin & Kochanekll2005l : 
iTreu & Koopmansi 120041 : iKoopmans et alj l2009l and references 
therein). 

At the level of small mass structure lensing stands out as a 
unique investigative method; different aspects of the lensed im- 
ages can be analysed to extract information about the clumpy 
component of galactic haloes. Flux ratio anomalies, astromet- 
ric perturbations and time-delays, in multiple images of lensed 
quasars, can all be related to substructure at scales smaller 
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As described by iKoopmansI {2005) and Vegetti & Koopmansi 
J2009j) . the information contained in multiple images and Einstein 
rings of extended sources, can also be used. While the former three 
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approaches only provide a statistical measure of the lens dumpi- 
ness, the latter allows one to identify and quantify of each single 
substructure, measuring for each of them the mass and the position 
on the lens plane. Both approaches are however complementary in 
that the former is more sensitive to low-mass perturbations, which 
are potentially present in large numbers, whereas the latter is sen- 
sitive to the rarer larger scale perturbations. 

The method of direct gravitational imaging of the lens poten- 
tial - shortly described in the following section - represents an ob- 
jective approach to detect dark and luminous substructures in in- 
dividual lens systems and allows on to statistically constrain the 
fraction of galactic sa tellites in early-type galaxies. Extensively de- 
scribed and tested in] Vegetti & Koopmansl j2009ah . the procedure 
is here applied to th e study of the double ring SDSSJ0946+1006 
dGavazzi et al .120081) from the sample of the Sloan Lens ACS Suwey 
(SLACS), yielding the first detection of a dwarf satellite through its 
gravitational effect only, beyond the Local Universe. 

The layout of the paper is as follows: in Section 2 we provide 
a general description of the method. In Section 3 we introduce the 
analysed data and in Section 4 we discuss the results of the mod- 
elling under the assumption of a smooth lens potential. In Section 
5 we describe the detection of the substructure. In Section 6 we 
present the error analysis and the model ranking. In Section 7 we 
discuss implication for the CDM paradigm and in Section 8 we 
conclude. Through out the paper we assume the following cosmo- 
logical parameters with H = 73 kms -1 Mpc~', £2 m = 0.25 and 
n A = 0.75. 



2 THE METHOD 

In this section we provide a short introduction to the lens modelling 
method. The main idea behind the method of "gravitational imag- 
ing" is that effects related to the presence of dwarf satellites and/or 
CDM substructures in a lens galaxy can be modelled as local per- 
turbations of the lens potential and that the total potential can be 
described as the sum of a smooth parametric component with lin- 
ear corr ections defined on a grid. We refer to I Vegetti & Koopmansl 
J2009j) for a more complete discussion. 



2.1 Source and potential reconstruction 



As shown in 



Blandford et"al](l200ll).lKoopmansl ( [2005l) . ISuvu et ail 



J2006I) and I Vegetti & Koopmansl j2009ah . it is possible to express 
the relation between perturbations in the lensed data (Sd; i.e. per- 
turbations of the surface brightness distribution of the lensed im- 
ages), the unknown source surface brightness distribution (s) and 
perturbations in the lens potential (Si//) as a set of linear equations 
Sd = -V.v ■ VStft. Through the Poisson equation Sift can be turned 
into a relation with the convergence correction Sk = V 2 Sif//2. 

For a fixed form of the lensing potential and regularization, 
the inversion of these equations leads to the simultaneous recon- 
struction of the source and a potential correction. The source grid 
is defined by a Delaunay tessellation which automatically concen- 
trates the computational effort in high magnification regions while 
keeping the number of degrees of freedom constant, which is criti- 
cal in assessin g the Bayesian posterior prob ability and evidence for 
the model (see lVegetti & Koopma ns 2009a). The procedure is em- 
bedded in the framework of Bayesian statistics which allows us to 
determine the best set of non-linear parameters for a given potential 
and the linear parameters of the source, to objectively set the level 
of regularization and to compare different model families dMacKavl 



ll992U2003l ; ISuvuet a l. 2006; Brewer & Lewis 200(3) . Specifically, 
for a particular lens system we wish to objectively assess whether it 
can be reproduced with a smooth potential or whether mass struc- 
ture on smaller scales has to be included in the model. 

The modelling is performed via a four steps procedure: (i) 
We start by choosing a form for the parametric smooth lens den- 
sity profile, generally an elliptical power-law, and we determine 
the non-linear parameters and level of source regularisation that 
maximize the Bayesian evidence, through a non-linear optimiza- 
tion scheme, (ii) In the case that this model is too simple and sig- 
nificant image residuals are left, we allow for grid-based poten- 
tial corrections. This leads to the initial detection and localization 
of possible substructures, (iii) The substructure masses and posi- 
tions are then more precisely qu antified by assuming a tida lly trun- 
cated pseudo-JafFe (PJ) profile jDalal & Kochanekl l2002n and by 
simultaneously optimising for the main lens galaxy and substruc- 
ture parameters, i.e. its mass M sllb and position on the lens plane 
OCsub! J/sub). Finally the two models, i.e. the single power-law 
(PL) and the power-law plus PJ substructures (PL+PJ), are com- 
pared through their total marginalized Bayesian evidences (£), that 
represent the (conditional) probabilities of the data marginalized 
over all variable model parameters. 

2.2 Detection Threshold of Mass Substructure 

The method has a mass detection threshold to substructure that 
depends on the signal-to-noise ratio and spatial resolution of the 
lensed images; for typical HST (e.g. SLACS) data quality the mass 
detection threshold for a substructure located on the Einstein ring 
and with a pseudo-Jaffe density profile is of the order of a few times 
10 8 M o an d quickly increases with the distance from the lensed im- 
ages (see I Vegetti & Koopmansl 120093) because of the decrease in 
the image surface brightness and local magnification. 

Despite having been developed with the specific task of iden- 
tifying and constraining the fraction of substructure in lens galax- 
ies, this technique can also be used to model complex lens poten- 
tials, that are relatively smooth, but do not have the simple symme- 
tries that are often assumed in mas s models (e.g. elliptic al power- 
law density profiles). As shown in iBarnabe et al.l d2009h . we can 
also reconstruct the lensed images and the relative sources down 
to the noise level, even for systems that are highly asymmetric and 
strongly depart from a power-law density profile. The grid-based 
potential correction is able to correct the inexact initial choice of 
the lens potential model and recover existing asymmetries in the 
mass distribution. 

In the rest of this paper, we use this method to analyse the 
double Einstein ring system SLACS SDSSJ0946+1006 and search 
for deviations from a smooth power-law elliptical mass model. 



3 THE DATA 

In this section we present a brief overview of the double Ein- 
stein Ring lens sys tem SLACS SDSSJ0946+1006. We refer to 
iGavazzi et all d2008h for a more detailed description. 

SLACS selects gravitational lens candidates from the Sloan 
Digital Sky Survey spectroscopic database on the basis of multiple 
emission lines in the spectru m at redshifts large r than that of the 
lower redshift target galaxies dBolton et al1l2006h . The system was 
selected by the presence of multiple emission lines at z s i = 0.609 
in the spectrum of a lensing galaxy at zi = 0.222. Subsequently 
confirmed as a strong lens with ACS on board the HST, the system 



© 2002 RAS, MNRAS 000,[TV?? 



Detection of a Dark Substructure through Gravitational Imaging 3 



4 SMOOTH MASS MODELS 



Figure 1. The image of the lens system SDSSJ0946+1006, obtained with 
HST-ACS through the filter F814W, after subtraction of the lens surface 
brightness distribution. 



shows a very peculiar structure in the lensed images which are com- 
posed by two concentric partial rings, at radii of 1.43" ± 0.01" and 
2.07" ± 0.02", respectively, from the centre of the lens galaxy. This 
particular configuration is related to the presence of two sources at 
different re dshifts which are bei ng lensed by the same foreground 
galaxy (see lGavazzi et al.l ( l2008h for the a priori probability of this 
event in a survey such as SLACS); the nearest source is lensed into 
the inner ring (Ring 1), while the second one, further away along 
the optical axis, is lensed into the outermost ring (Ring 2). Ring 
1 with a F814W magnitude m, = 19.784 ± 0.006 is one of the 
brightest lensed sources in the SLACS sample, while Ring 2 with 
m 2 = 23.68 ± 0.09 is 36 times fainter. Ring 2 is not observed in 
the SDSS spectrum, and an upper limit to its reds hift z S 2 < 6.9 
was se t on the basis of ACS imaging. As inferred bv lGavazzi et al.l 
J2008I) the lens galaxy has a projected dark matter mass fraction 
inside the effective radius that is about twice the average valu e of 
the SLACS lenses dGavazzi et al .l2007l : lKoopmans et alj2006h . i.e. 
/dm (< ^eff) ~ 73% ± 9%, corresponding to a project dark matter 
mass approximately equal to Mom (< Reg) ~ 3.58 x 10" h^M . 

The high dark-matter mass fraction makes this system particu- 
larly interesting for CDM substructure studies. If the framework of 
galaxy formation given by N-body simulations is correct (substruc- 
ture mass function slope a = 1.90 and projected dark matter frac- 
tion in substructure / = 0.3%), we would expect, within an annulus 
of 0.6" center ed on the Einstein radius, on a verage \i = 6.46 ± 0.95 
substructures (jVegetti & Koopmans 2009b) wit h masse s between 
4x 10 6 M o and 4x 10 9 M^ l lDiemand et~a l. 2007a b. l2008h . Whereas 
these have typical masses a few times that of the lower limit on 
this range, the probability of finding a mass substructure above 
> 10 s M is certainly non-negligible. 



In this section we describe the details and the results of our analysis. 
Because of the very low surface brightness, and of the low signal- 
to-noise ratio of the images associated with Ring2, we limit our 
study to a tight annulus around Ring 1, in which Ring 2 has been 
fully excised. This does not affect the lens potential reconstruc- 
tion which is almost solely constrained by the detailed information 
given by the high surface brightness distribution of Ring 1. Our po- 
tential reconstruction therefore only probes the region around the 
inner ring. The choice of reconstructing the potential (if/) inside a 
limited field of view (e.g. mask), rather than the surface mass den- 
sity (k), ensures that the potential an d the resulting co nvergence re- 
constructions are both unbiased (see lKoopmansi2005l . for a detailed 
discussion about this subtle point). 



4.1 Image Reconstruction 

At the first level of reconstruction all potential corrections are kept 
at zero. We start by assuming that to first order the lens is well 
app roximated by a simple smooth elliptical power-law density pro- 
file jBarkanalll998l) with a convergence (surface density in terms of 
critical density E c ) 



k(r) ■■ 



(1) 



where r = ^jx 1 + y 2 /f 2 . The non-linear parameters describing the 
lens are: the lens strength a, the position angle 6, the flattening /, 
the centre coordinates x , the projected density slope q, the shear 
strength T s i, and the shear angle 8 s i,. We do not optimise for the 
mass centroid, but center on the peak of the surface brightness dis- 
tribution, as precisely determined from the HST image. We show 
in Section [631 that this assumption does not alter the main results 
of the paper, but reduces our substa ntial computational loa d. 

As described in more detail in lVegetti & Koopmans! J2009ah . 
the source grid is constructed from a (sub) sample of pixels in the 
image plane which are cast back to the source plane using the lens 
equation. The number of grid-points can be objectively set by com- 
paring their Bayesian evidence. In this particular case, we find that 
using all the image points (e.g. 81x81 pixels) is the most appropri- 
ate choice. On the image plane the pixel-scale is constant and equal 
to 0.05"/pixel, while on the source plane the Delaunay triangle- 
scale is adaptive and depends on the local lensing magnification. 
We adopt an adaptive curvature regularization, weighting the regu- 
larization penalty by the inverse of the image signal-to-noise ratio. 
We find that this significantly improves the modeling of sharp high 
dynamic range features in the lensed images, where in general all 
other forms of regularization (e.g. gradient or unweighted curva- 
ture) falter and give much lower ev idence values. 

We use the results obtained bv lGavazzi et alj ( f2008l ). for a sin- 
gle lens plane, as starting point 7/0 and then optimize for the po- 
tential parameters and the level of the source regularization. The 
resulting source and image reconstruction are presented in Fig. [2] 
In Table 1 the recovered lens parameters and level of source reg- 
ularization |// b ,/ij,b) are listed. The recovered parameters for the 
smooth mass compon ent of the lens potenti al are somewhat differ- 
ent from the results in Gav azzi et al .1 < l2008l) . which we attribute to 
the fact that lGavazzi et alj 12008!) makes us of both Einstein rings 
and matches conjugate points instead of the full surface brightness 
distribution. Some notable results for the smooth mass model are 
a density slope y' = (2q + 1) ss 2.20 and a mass axial ratio of 
/ = 0.96, indicating that the galaxy is very close to an isothermal 
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sphere mass model, although has a slightly steeper density profile. 
Th e quantity cannot be compared direclty with the slope measured 
bv lGavazzi et alj J2008t) becaus e we are measuring th e slope at the 
location of the inner ring, while lGavazzi et al.1 j2008l) is measuring 
the average slope in between the rings. 

4.2 Image Residuals after Reconstruction 

In [2] we clearly see remaining image residuals above the noise 
level, in particular near the upper-most arc feature. The source ap- 
pears to be a normal, although not completely symmetric, galaxy. 
Structure in the source (e.g. brightness peaks and the faint tail-like 
feature to the upper-right of the source) can also be one-to-one re- 
lated to structure in the arcs. This provides strong confidence the the 
overall reconstruction of the system, as being remarkably accurate 
despite its complexity. The source still shows significant structure 
on small scales, which is due to a preferred low level of regularisa- 
tion, when optimizing for the Bayesian evidence (note that at this 
level the evidence is simply the posterior probability of the free 
parameters, including the source regularisation). 

The image residuals can be related either to different aspects 
of the reconstruction procedure, for example to the modelling of 
the PSF, the choice of the simply-parametrized model for the lens 
potential, the number and scale of the image pixels, the lens-galaxy 
subtraction or features in the galaxy brightness profile. To test 
whether these residuals are related to the presence of substructures, 
however, we now first proceed by consider a more general model in 
which we allow for very general potential corrections (see above). 
We discuss the effects of systematic errors in a later Section, but 
stress that non of the above systematic errors are expected to mimic 
localized lensing features. 



5 THE DETECTION OF MASS SUBSTRUCTURE 

From the "Occam's razor" point of view, it is more proba- 
ble that uncorrelated structures in the lensed images are related 
to local small-scale perturbations in the lens potential, rather 
than features in the source distribution itself dKoopmansI 120051 ; 
IVegetti & Koopm ans 2009a). It is, therefore, possible to describe 
galaxy substructure or satellites as linear local perturbations to 
an overall smooth parametric potential and separate them from 
change s in the surface brightness distribution due to the source 
model l lKoopmansll2005h . Given that the remaining image residuals 
are small, we can assume that the values for the lens parameters 
recovered in the previous section are sufficiently close to the real 
smooth component of the lens potential such that our linearized 
reconstruction of the source and the grid-based lens-potential cor- 
rections are fully justified, as discussed in Section |2~T1 

5.1 Grid-based Substructure Modeling 

The potential corrections are defined on a regular Cartesian grid 
with 21 x 21 pixels. Both the source and the potential have a cur- 
vature regularization (in the case of the source inversely weighted 
by the local image signal-to-noise ratio) and are initially over- 
regularized in order to keep the potential corrections in the linear 
regime, where the formalism of the method is valid. The potential 
corrections are repeated (adding the previous correction to the cur- 
rent total potential), until convergence is reached in the evidence 
value. Results for this linear reconstruction are presented in Fig. [3] 



The potential correction and convergence show a clear sig- 
nature of a concentrated mass over-density (i.e. a substructure) ob- 
served around the position (-0.5, 1.0)". We have tested the effect of 
the potential-correction regularization on the stability of the recon- 
struction by using three different levels of regularization, in partic- 
ular A&p = 10 7 , Asp = 10 s and A 5l p = 10 9 . As expected, the conver- 
gence correction becomes smoother as the regularization increases; 
however the feature near (-0.5, 1.0)" in the convergence-correction 
map remains clearly visible in each reconstruction, together with a 
minor mass gradient from the low right side of the ring to the up left 
side. This gradient is associated with the presence of the substruc- 
ture itself (curvature regularization of the potential implies (a-) = 
in the annulus and thus neither the total mass nor average conver- 
gence gradient changes in the annulus; this is an advantage of the 
method). For the nearly under-regularised case with = 10 7 , 
the source is slightly twisted and the reconstruction becomes more 
noisy. This suggests that the potential allows for a minor amount of 
shear (we note that shear has k = everywhere and that is not pe- 
nalised by a curvature form of regularization), but that the substruc- 
ture, although noisier, is still present near the same position as in 
the other reconstruction. We therefore believe, given the data, that 
this feature is genuine. This statement, however, requires quantifi- 
cation. This is difficult at moment, based on the grid-based method, 
but can be done if the substructure is modeled through a simply 
parametrized mass component. 

5.2 Parameterized Substructure Modeling 

We quantify the mass of this substructure by assuming an analytic 
power-law (PL)+ substructure model. We assu me the structure to 
have tidally-truncated pseudo-Jaffe (PJ) profile jDalal & Kochanekl 
|2002|) with a convergence 

k(r)=^f[r- i -(r 2 + rjr^], (2) 

where r, is the substructure tidal radius and a SU b its lens strength; 
both are related to the main galaxy lens strength a and to M sub 
by r, = y/a^a and M sub = nr,a su \£ c . Combining the last two 
relations leaves its total mass and position on the lens plane as free 
parameters for the substructure model. Fig. |4(b)| shows the final 
result of the Bayesian evidence maximisation for both the main 
lens and substructure parameters. 

Remarkably, this procedure requires a substructure right at the 
position of the convergence overdensity found in the grid-based re- 
construction. In terms of Likelihood, the PL+PJ model is favoured 
with a |Alog_£| = +161.0 over the PL model (i.e. roughly com- 
parable to a Ax 2 ~ 2 A log X improvement). One might note, that 
the two models still seem to have similar levels of image residuals. 
This can be attributed to a significant difference in the source regu- 
larization. The smooth model, in order to fit the data, has to allow 
for more freedom to the source and has a lower level of source reg- 
ularization. Hence, part of the potential structure is "absorbed" in 
the source brightness distribution. To assess the level at which the 
source regularization contributes to the image residual level, we run 
a non-linear optimisation for the smooth model while keeping fixed 
the regularization constant at the level of the best PL+PJ model, we 
call this over-regularized model PLo. ovcr (see fig. |4(a)[ l. The Likeli- 
hood difference, between a perturbed model and a smooth one, is 
now further increased to |Alog£| = +183.0. Hence, indeed there 
is some covariance between the potential and source models. How- 
ever, no smooth potential model does as well as models that include 
the PJ substructure model, near the position found in the grid-based 
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Figure 2. Results of the lens and source reconstruction under the hypothesis of a smooth potential. The top-right panel shows the original lens data, while the 
top-left one shows the final reconstruction. On the second row the image residuals (left) and the source reconstruction (right) are shown. 



reconstruction. We are therefore convinced, based on the Likeli- 
hood ratio, that a PL+PJ model provides a much more probable 
explanation of the data, than a PL model with a more structured 
source model. 

Despite the difference in _£(//), at this stage of the modelling, 
it not possible to state whether the detection is statistically signifi- 
cant, because the effective number of degrees of freedom have not 
yet been accounted for. As shown in the next section, a nested- 
sampling exploration and marginalisation of the posterior probabil- 
ity density can be used to clarify this point and provide the Bayesian 
evidence values for the PL and PL+PJ models that can objectively 
be compared. 



6 ERROR ANALYSIS AND MODEL RANKING 

In this section we present the statistical analysis on the model 
parameters and the total marginalized evidence computation for 
model comparison. We are interested to test whether the lensed im- 



ages are compatible with a single smooth potential or whether the 
data indeed objectively require the presence of a mass substructure. 
We consider therefore two models, one defined by a smooth lens 
with a power-law density profile and one containing an additional 
mass substructure. In general, two models can only be objectively 
and quantitatively compared in terms of the total marginalized 
Bayesian evidence and the Bayes factor, AlogcS = log£ - log£[, 
which expresses their relative probability given a specific data-set. 

Heuristically the Bayesian evidence (£) can be compared to 
the classic reduced x 1 (i- e - P er degree of freedom), but without as- 
sumptions about Gaussianity of the posterior probability distribu- 
tion function about lack of covariance between parameters (which 
could reduce the effective number of degrees of freedom). 



6.1 Prior Probabilities 

Prior to the data-taking, little is known about the non-linear pa- 
rameters describing the lens potential model. A natural choice is 
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Figure 3. Results of the pixelized reconstruction of the source and lens potential corrections for three different value of the potential corrections regularization 
= 10 7 (top left panels), Am = 10 8 (top right panels) and Xgj, = 10 9 (low panels). The top-right panel shows the original lens data, the middle one shows 
final reconstruction while the top-left one shows the image residuals. On the second row the source reconstruction (left), the potential correction (middle) and 
the potential correction convergence (right) are shown. 
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Figure 4. Shown are the PL+PJ model (right panel) and for the PL model with the same source regularisation as the PL+PJ model (left panel). The residuals 
for the PL model are still subtle but have become more pronounced and that the source model also still has more structure, despite over-regularisation. Some 
of these residuals are reduced by lowering the source regularisation (see text), but the evidence difference between the two models remains large. 



therefore a uniform prior probability. We centre this prior on the 
best smooth values rj bi as recovered in Section |4~T1 although the 
choice of prior range is not very relevant as long as the likelihood 
is sharply peaked inside the prior volume: 



P(.Vi) 



constant for |;/b,i - ?7i| < Srji 
for |^ bi - > Siji. 



(3) 



Hence, the sizes of the intervals are taken in a such a way that they 
enclose the bulk of the evidence (i.e. likelihood times prior vol- 
ume). Exactly identical priors for r\ are used for both the smooth 
and perturbed model. Also in the latter case, the prior is centered 
on the mass model parameters of the smooth model. This ensures 
that we are comparing their evidences in a proper manner. The reg- 
ularization constant has a prior probability which is logarithmically 
flat in a symmetric interval around X s p. The mass substructure is 
assumed to have a pseudo-Jaffe density profile and a mass with a 
flat prior between M m j n = 4.0 x 10 6 M o and M raax = 4.0 x 10 9 M o 
jPiemand et"ai]|2007al lbl) and a position with a flat prior over the 
complete data grid. We note that our recovered mass, although 
close to the upper limit, is well inside this range (see below). We 
choose this range to make a comparison with simulations easier, 
but could have chosen a smaller or larger range. The results, how- 
ever, are similar (only the evidence is offset by a constant value for 
both the PJ and PL+PJ models). 

6.2 The Evidence and Posterior Probability Exploration 

One of the most efficient methods for exploring the posterior prob- 
ability within the framework of B ayesian s t atistic s is the nested- 
sampling technique developed by ISkillind b004l) . Although be- 
ing faster than thermodynamic integration, the nested sampling can 
still be computationally expensive as the overall computational cost 
rapidly grows with the dimension D of the problem as 0(D 3 /e 2 ), 



where e is the desired level of accuracy dChopin & Robertll2008h . 
Most of the nested-sampling computational effort is required by 
the simulations of points from a prior probability distribution n(jj) 
with the constraints that the relative likelihood £.{rf) has to be 
larger than an increasing threshold £.* . Different approaches have 
been su ggested in order to incre ase the performance of this sim- 
ulation. IChopin & Roberj J2008h . for example, proposed an ex- 
tension of the nested sa mpling, based on the pr inciple of im- 
portance sampling, while iMukheriee et al.l d2006l) developed an 
ellipsoidal nested sampling by appro ximating the iso-li kelihood 
contours by D-dimensional ellipsoids. IShaw et all l l2007t) . subse- 
quently improved the ellipsoidal nested sampling with a clusters 
nested sampling which allows efficient sampling also of multi- 
modal posterior distributions. 

In our analysis, we replace the standard Nested Sampling used 
in lVegetti & Koopmansl d2009j) with M ULTINEST, a mul t imoda l 
nested sampling algorit hm developed by Feroz & Hobsonl d2008l) . 
As further improved by iFeroz et al.l d2009h . MULTINEST allows 
to efficiently and robustly sample posterior probabilities even when 
the distributions are multimodal or affected by pronounced degen- 
eracies. The possibility of running the algorithm in parallel mode 
further reduces the computational load. 

The most appealing property of nested-sampling-based tech- 
niques is that they also efficiently explore the model parameter pos- 
terior probabilities and simultaneously compute the marginalized 
Bayesian evidence of the model. The former provide error determi- 
nations for the parameters of a given model, while the latter allows 
for a quantitative and objective comparison between different and 
not necessarily nested (i.e. one model is not necessarily a special 
case of the other) models. The Bayesian evidence automatically 
includes the Occam's razor and penalises models which are unnec- 
essarily complicated. This means that a PL+substructure model is 
preferred over a single PL only if the data require the presence of 
extra free parameters and the likelihood of the model increases suf- 
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ficiently to offset the decrease in prior probability (i.e. extra model 
parameters lead to a larger prior volume and hence a smaller prior 
probability density near the peak of the likelihood function). 

6.3 The Substructure Evidence and Model Parameters 

The main result of the Nested-Sampling analysis is that the PL+PJ 
model has a substructure with mean mass 

M sub = (3.51 ±0.15) x 1O 9 M , 

located at a position (-0.651 ± 0.038, 1.040 ± 0.034)" (see Table 
1); the quoted statistical errors do not take into account the system- 
atic uncertainties, but fully account for all covariance in the mass 
model. In our case, systematic errors are mostly related to the PSF 
and to the proc edure for the subtract ion of the lens galaxy surface 
brightness (see iMarshall et af]|2007l . for a discussion). Effects re- 
lated to systematic uncertainties are explored in Section 1631 Note 
that the results of this section are in agreement with those in the 
previous section and in particular that the substructure is exactly 
located where the positive convergence correction is found by the 
pixelized potential reconstruction (see fig- [3}. 

Finally, we find that, the perturbed PL+PJ model is strongly 
favoured by the data with Alog£ = logfipj - log£pL+pj = 
20353.90 - 20482.1 = -128.2. Heuristically, and ignoring the dif- 
ference in degrees of freedom between the PL and PL+PJ models, 
this would correspond in classical terms to more or less a dramatic 
A% 2 ~ 256 improvement in the model. Given that we have thou- 
sands of data pixels and no major residuals features this shows that 
adding only a few extra parameters to the lens model improves the 
agreement between the model and the data over a wide range of 
data pixels. Heuristically one might further estimate the substruc- 
ture mass error to be 6M suh ~ M su b/ V2|A£| ~ 0.2 x 10 9 M o , which 
is indeed close to the proper determination of this error. We are 
therefore confident about this detection and its strong statistical sig- 
nificance. This represents the first gravitational imaging detection 
of a dark substructure in a galaxy. 

However, to test the robustness of this detection (i.e. system- 
atics) we will now subject our reconstructions against several sub- 
stantial changes in the model and the data, some of these going far 
beyond what could be regarded as reasonable changes. 

6.4 The Substructure Mass-to-Light Ratio 

Based on the residual images, we determine an upper limit on the 
magnitude of the substructure in two different ways: by setting the 
limit equal to three times the estimated (cumulative) noise level 
or by aperture-flux fitting, both inside a 5x5 pixel (0.25"x0.25") 
window. The aperture is chosen to gather most of the light of the 
substructure, which is expected to be effectively point-like, given 
the typical size of the optical counterpart of galaxies of a 10 9 M o . 
They are in good agreement, because the image residuals are very 
close to the noise level. The 3-cr limit is found to be /f814W,3it > 
27.5 magn. At the redshift of the lens-galaxy this corresponds to a 
3-<x upper limit in luminosity of 5.0 x 10 6 Ly jS . Within the inner 
0.3 kpc and 0.6 kpc, we therefore find that the integrated mass is 
respectively (5.8±0.3)x 10 8 M o and (l.l±0.05)x 10 9 M G and hence 
lower limits of (M/L) Vo > 120 M /L v , (3-cr) and (M/L) Vo > 
218 M G /L V , (3-0-). 

The mass of the substructure is at the upper end of the mass 
function of Milky Way satellites (see Fig |7(a)| and Fig |7(b)fr . This 
is not surprising as the normalization of the mass function scales as 
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lines) and for a Gaussian prior in a (dashed lines). Contours (inside out) are 
set at levels A log(P) = -1,-4,-9 from the peak of the posterior probability 
density. 



the total mass of the host galaxy and SDSSJ0946+1006 is substan- 
tially more massive than the Milky Way at fixed radius (factor ~4). 
Moreover, if indeed gas is stripped from low-mass satellites though 
feedback and radiation in a strong star formation or starburst phase 
of the lens galaxy, during its formation, one might naturally expect 
that dwarf satellites that formed around or near massive early-type 
galaxies have larger M/L ratios than those in the Milky Way. How- 
ever, the total M/L upper limit is not far from those found for Milky 
Way satellites (e.g. lStrigari et al .1120081) (Fig |7(a)) . 

6.5 Robustness and Systematic Errors 

A number of major sources of systematic error might still affect 
the lens modelling: the PSF modeling, the pixel scale and lens 
galaxy subtraction from the lens plane. To determine at which level 
systematic errors influenced the substructure detection we tested 
the PL+PJ modelling (see Section [5} by rotating the PSF model 
through 90° from the original one; we call this model (PL + PJ) ps f9o- 
We also used a different data-set with smaller drizzled pixels 
(0.03") and a different lens galaxy subtraction procedure (using a 
Sersic profile rather than a b-spline surface brightness profile); we 
call this model (PL + PJ) su bt- We refer to the corresponding smooth 
models as PL psf c)o and PL sllbt , respectively. The results are shown in 
figs.|5^|5(b)]|^c)]and[5^)]and listed in Table 1. 

More precisely, (PL + PJ) su bt is not only rotated but also has 
a different pixel scale (0.03"/pixel), a different number of pixels, a 
different noise level, and a different PSF, so that we also test against 
all these changes. We also check whether the form of source reg- 
ularization has any effect by running a PL and a PL+PJ modelling 
for a non-adaptive regularization constant and for a gradient regu- 
larization. Finally we run an optimization for both the smooth and 
the perturbed model in which the centre of the lens is allowed to 
change and an optimization with a larger PSF. 
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Figure 5. Top Left panel: The over-regularised PL model with a rotated PSF. Top Right panel: The PL+PJ model with a rotated PSF. Bottom Left panel: 

The over-regularised PL model with a smaller pixel scale and a different procedure for the lens galaxy subtraction. Bottom Right panel: The PL+PJ model 
with a smaller pixel scale and a different procedure for the lens galaxy subtraction. 



All tests (see Table 1) lead to results that are consistent with 
each other for both the main lens and the substructure parameters. 
First we note that rotating the PSF changes the evidence by a value 
that could expected based on the sampling error in the nested sam- 
pling. Hence we conclude that PSF effects are negligible. In the 
case of the subt model, we note that we are no longer compar- 
ing the same data-sets and that the evidence values have dramat- 
ically changed. This simply reflects the large increase by a factor 
~ (0.05/0.03) 2 in the number of data-points. Bayesian evidence 
can not be used to compare different data-set, but we can compare 
the PL sll bt and (PL+PJ) sll b t models amongst each other. First, we 
remark that the pixel scale in this data-set is considerable smaller 
than the resolution and pixel-scale in the image, hence neitherthe 
data pixels nor their errors are fully independent. This leads to a 
rather odd stripped source reconstruction, not observed for the orig- 
inal data-set. Despite this difference, we notice that image residuals 
in the PL+PJ models are reduced, especially near the substructure 
positions, compared to the PL model. The Likelihood difference is 



A log _£ su bt = log Xpj - log -£pl+pj = - 154 in favor of the substruc- 
ture model. 

Over all, we are therefore confident that the substructure de- 
tection is not only a statistically sound detection, but also robust 
against dramatic changes in the model and the data. 



7 THE SUBSTRUCTURE MASS FUNCTION 

What does this imply for the ACDM model and the expected frac- 
tion of mass in substructure? Given th e statistical formalism pre- 
sented in IVegetti & Koopmansl J2009bh . we can use this detection 
to constrain the projected dark matter mass fraction in substructure 
/ and the substructure mass function slope dN/dm cc m~ a . We note 
that we can ignore the baryonic content in the substructure, because 
of its large total mass-to-light ratio. 

To make a proper comparison with simulations, we assume 
that the substructure mass can assume any value from M min = 4.0 x 
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Figure 7. Top panel: The integrated mass in units of solar masses, within the inner 0.3 kpc as a function of the total luminosity, in units of solar luminosity for 
the Milky Way satellites (black points) and the substructure detected in this paper (red arrow). Note the small error on the substructure mass. Bottom panel: 
the integrated mass-to-light ratio, within the inner 0.3 kpc as a function of the total luminosity, in units of solar luminosity for the Milky Way satellites (black 
points) and the substructure detected in this paper (red arrow). 



10 6 M o to M raax = 4.0 x 10 9 M o jDiemand et alj2007 j|bll200^) and 
that the mass we can detect varies from M tow = 0.15 x 10 s M a to 
Mhigh = M nrdx . We note that different limits would scale both the 
simulation and observed mass fraction in the same way. The mass 
fractions quoted throughout the papers are for this mass range only. 
We ignore the error on the measured substructure mass; this does 
not influence the results because our detection is well beyond the 
error a m = 0.15 x 10' M level. Given the quality of the fit for a 
PL+PJ model, we are confident that there are no other substructures 
with mass above our detection threshold. 

Figure [6] shows the joint posterior probability den- 
sity function P(a,f\n s = l,m = M m \>,p) contours and the 
marginalized probability densities P(f\n s = l,m = M su \,,p) and 
P(a | n s = l,m = M sllb ,p), given one detected substructure n s = 1 
with mass m = M su \,\ where p is a vector containing the model 
parameters, M mm , M mllx , M\ ow and Mhi a h- Specifically, from the ma- 
riginalized probability density distributions we find / = 2.56+] ^% 
and a = 1.36^^ at a 68% confidence level for a flat prior on a 
and / = 2.15+^% and a = 1.88*g;}° at a 68% confidence level 
for a Gaussian prior centred in 1.90 ±0.1. The same results are 
found if an error on the mass measurement and a detection thresh- 
old M\ ovl = 3 x cr,„ are assumed. 

As already discussed in IVegetti & Koopmans! J2009bh . while 
even a single lens system is enough to set upper and lower lim- 
its on the mass fraction, a larger number of lenses is required 
in order to constrain the mass function slope, unless a stringent 
prior information on the parameter it-self is adopted. By assuming 
a Gaussian prior of the mass function slope centred in 1.90, we 
can quantify the probability that the dark matter mass fraction is 
the on e g iven b y N-body simulations / « 0.3% jDiemand et al.l 
l2007al lbl | 2008), by considering ratio between the posterior 
probability densities P (/N-body, a = 1.9| n„ = \,m = M su b,p) and 
PmaxCf, at = 1-9 I n, = l,m = M sllh ,p) and find that this ratio is 
0.51. Hence, currently our measurement and that inferred from N- 
body simulations still agree as a result of the rather larger error-bar 
on the measured value of /. The combination of more lens systems 
is, of course, required to set more stringent constraints also on a. 
We plan such an analysis in forthcoming papers. 

Given our best value of / = 0.0215 for a = 1.9, we might ex- 
pect to detect ~ 1 mass substructures above our 3-cr mass-threshold 



of 4.5 x 10 s M B . It is therefore unlikely that we have missed many 
substructures with a mass slightly below that of our detection. 
Given this result and image residuals already at the noise level, we 
believe that adding a second substructure is not warranted and miss- 
ing lower-mass substructure leads only to a minor bias (note that 
logarithmic bins have nearly equal amounts of mass for a = 1.9). 



8 SUMMARY 

We have applied our new Bayesian and ada ptive-grid method for 
pixeliz ed source and lens-potential modeling dVegetti & Koopmans! 
l2009al) to the analysis of HST d ata of the double Ein stein ring sys- 
tem SLACS SDSSJ0946+1006 dGavazzi et ai1l2008h . This system 
was chosen based on its large expected dark-matter mass fraction 
near the Einstein radius and the high signal-to-noise ratio of the 
lensed images. Although these two facts should be uncorrelated to 
the mass fraction of CDM substructure, both incidences maximize 
the change of detection. 

We find that a smooth elliptical power-law model of the sys- 
tem leaves significant residuals near or above the noise level; these 
residuals are correlated and spread over a significant part of the 
lensed images. Through a careful modeling of this data including 
either lens-potential corrections or an additional (low-mass) simply 
parametrized mass component, we conclude that the massive early- 
type lens galaxy of SLACS SDSSJ0946+1006 hosts a large mass- 
to-light ratio substructure with a mass around M sllb ~ 3.5 x 10* M Q , 
situated on one of the lensed images. A careful statistical analysis 
of the image residuals, as well as a number of more drastic ro- 
bustness tests (e.g. changing the PSF, pixel number and scale, reg- 
ularization level and form, galaxy subtraction and image rotation), 
confirm and support this detection. Based on this detection, the first 
of its kind, we derive a projected CDM substructure mass fraction 
of ~ 2.2% for the inner regions of the g alaxy, using the Bayesian 
method of I Vegetti & Koopmans! d2009br) ; this fraction is high, but 
still consistent with expectations from numerical simulations due 
to the large (Poisson) error based on a single detection. 

The numerical details of our results can be further summarized 
as follows: 

(1) Using a Bayesian Multinest Markov-Chain exploration of 
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Table 1. Parameters of the mass model distribution for the lens SDSSJ0946+1006. For each parameter we report the best recovered value and the relative 
Likelihood for a smooth model (PL) in column (2), for a smooth over-regularized smooth model in column (3), for a perturbed model (PL+PJ) in column 
(4), for a smooth and perturbed model (PL+PJ) with rotated PSF respectively in columns (5) and (6) and a smooth and perturbed model (PL+PJ) for different 
galaxy subtraction respectively in columns (7) and (8). We note that the models in the final two columns use a different (also rotated) data set and the evidence 
values, position angles and positions can therefore not be directly compared. 
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the full model parameter space, we show that the identified object 
has a mass of M sub = (3.51 ± 0.15) x 10' M G (68% C.L.) and is 
located near the inner Einstein ring at (-0.651 ± 0.038, 1.040 ± 
0.034)". The Bayesian evidence is in favor a model that in- 
cludes a substructure versus a smooth elliptical power-law only, by 
Alog(fi) = - 128.2. This is roughly equivalent to a 16- cr detection. 

(2) At the redshift of the lens-galaxy a 3-cr upper limit in lu- 
minosity is found of 5.0 x 10 6 Ly Q , Within the inner 0.3 kpc and 
0.6 kpc, we find that the integrated mass is respectively (5.8 ± 
0.3) x 10 s M and (1.1 ± 0.05) x 10 9 M o and hence lower limits of 
(M/L) V G > 120 M /L v . (3-cr) and (M/L) V o > 218 M /L v . (3- 
cr). This is higher than of MW satellites, but maybe not unexpected 
for satellites near massive elliptical galaxies. 

(3) The CDM mass fraction and a mass function slope are equal 
to / = 2.15+f-g% and a = 1.88+° J°, respectively, at a 68% confi- 
dence level for a Gaussian prior on a centred on 1.90 ±0.1. For a 
flat prior on a between 1.0 and 3.0, we find / = 2.56 + j ?n%. Asking 
whether the / = 2. 15% is consistent with / = 0.3%, we find a like- 
lihood ratio of 0.5 1 ; indeed both are consistent. This is the result of 
the considerable measurement error found for /, because it is based 
on only a single detection. 

This is the first application of our adaptive "gravitational 
imaging" method to real data and clearly shows its promise. In 
the near future we will apply the method to a larger set of SLACS 
le nses in order to constr a in, via the statistical formalism presented 
in IVegetti & Koopmanj d2009bl) . the dark matter fraction in sub- 
structure and the substructure mass function. 
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